Got CPU? Unlock AI speed and flexibility through algorithms and virtualization


The AI industry currently assumes that CPUs are inferior to GPUs and other specialized processors (such as TPUs) for heavyweight AI computation. The popular algorithms developed in the 1980s for training neural networks boil down to a series of matrix multiplications. Matrix multiplication is one of the few operations whose regular memory access pattern lets a GPU (or TPU) use thousands of cores to perform the calculation far faster than a CPU can.
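To make the point concrete, here is a minimal sketch (sizes are illustrative, not from the article) of why a fully connected layer is "just" matrix multiplication:

```python
# A minimal sketch: one fully connected layer is a single matrix multiply.
# Every output neuron reads every input, so memory access is perfectly
# regular -- exactly the pattern GPUs and TPUs are built to exploit.
import numpy as np

batch_size, in_dim, out_dim = 64, 1024, 4096  # illustrative sizes

x = np.random.randn(batch_size, in_dim).astype(np.float32)  # input activations
W = np.random.randn(in_dim, out_dim).astype(np.float32)     # layer weights

y = x @ W   # the whole layer: (64, 1024) @ (1024, 4096) -> (64, 4096)
print(y.shape)
```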

As AI training has advanced, it has become increasingly obvious that full matrix computation is overkill for large models. Performing selective computation (sparse adaptive operations) promises to be a far more efficient alternative to full matrix multiplication. However, the overhead of adaptive sparse selection, together with its cache-unfriendly memory access patterns, makes existing implementations of the idea very slow: unpredictable memory access forfeits most of the speedup that specialized hardware provides. As a result, the community remains stuck with the wasteful 1980s algorithms for training neural networks, hoping that hardware acceleration will keep scaling.
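The tension is visible even in a toy sketch (below, with made-up sizes and a random stand-in for the selection rule): computing only a few output neurons slashes the arithmetic, but the gather over scattered weight rows is exactly the irregular access pattern that erodes a GPU's advantage.

```python
# Selective computation: produce only k of out_dim outputs per input.
# The fancy-indexed gather W[active] touches scattered rows of W --
# cache-unfriendly, and hostile to the regular access GPUs depend on.
import numpy as np

in_dim, out_dim, k = 1024, 4096, 40           # k is ~1% of the neurons
x = np.random.randn(in_dim).astype(np.float32)
W = np.random.randn(out_dim, in_dim).astype(np.float32)

# Stand-in for a real adaptive selection rule (which is the hard part).
active = np.random.choice(out_dim, size=k, replace=False)

y_sparse = W[active] @ x   # only k dot products instead of out_dim
print(y_sparse.shape)      # (40,)
```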

Enter ThirdAI's brain-like high-efficiency algorithm

At ThirdAI, we have discovered how to use probabilistic data structures to design super-efficient, brain-like "associative memories". These memories enable selective sparse computation, similar to sparse coding in the brain, to train neural networks efficiently. The resulting implementation is called BOLT (Big Ol' Layer Training). It reduces the computation needed to train a neural network exponentially while achieving the same accuracy.
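The article does not spell out the data structure, so the sketch below is illustrative only: one well-known way to build such a probabilistic associative memory is locality-sensitive hashing, where hashing an input retrieves a small bucket of neurons whose weights likely align with it, and only those neurons are computed.

```python
# A toy associative memory via locality-sensitive hashing (signed random
# projections). Hashing an input retrieves candidate neurons with likely
# large dot products; only those are computed. Illustrative only -- not
# ThirdAI's actual (unpublished) data structures.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)
in_dim, out_dim, n_bits = 1024, 4096, 12

W = rng.standard_normal((out_dim, in_dim)).astype(np.float32)      # neuron weights
planes = rng.standard_normal((n_bits, in_dim)).astype(np.float32)  # hash planes

def simhash(v):
    """n_bits-bit signature from the signs of random projections."""
    return int.from_bytes(np.packbits(planes @ v > 0).tobytes(), "big")

# Index every neuron's weight vector once (the "memory").
table = defaultdict(list)
for j in range(out_dim):
    table[simhash(W[j])].append(j)

# Per input: hash once, then compute only the colliding neurons.
x = rng.standard_normal(in_dim).astype(np.float32)
active = table.get(simhash(x), [])
y = W[active] @ x
print(f"computed {len(active)} of {out_dim} neurons")
```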

The BOLT algorithm trains neural networks with 1% or less of the FLOPS, unlike standard techniques such as quantization, pruning, and structured sparsity, which deliver only modest constant-factor improvements. Because we do not rely on any specialized instructions, the speedup appears naturally on any commodity CPU, whether Intel, AMD, or ARM. Even an older commodity CPU can train a billion-parameter model faster than an A100 GPU. (Learn more about the technology and benchmarks.)
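A quick back-of-the-envelope calculation (layer sizes made up here, not ThirdAI's benchmark figures) shows why a 1% FLOPS budget is a different regime from a constant-factor trick:

```python
# Back-of-the-envelope arithmetic for the "1% of FLOPS" claim on one wide
# layer. Sizes are illustrative only.
in_dim, out_dim = 4096, 100_000
dense_flops = 2 * in_dim * out_dim        # one multiply-add per weight
sparse_frac = 0.01                        # touch ~1% of neurons per sample
sparse_flops = dense_flops * sparse_frac

print(f"dense:  {dense_flops:>13,} FLOPs")
print(f"sparse: {sparse_flops:>13,.0f} FLOPs ({sparse_frac:.0%} of dense)")
```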

ThirdAI's software accelerator is implemented in plain C, with ready-made Python bindings. The power of the ThirdAI algorithm can therefore be embedded in any workflow, whether TensorFlow, PyTorch, or anything else. Because the software does not rely on specialized instructions, it runs accelerated on any CPU architecture.
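What "embedded in any workflow" might look like in practice: data from any framework crosses the boundary as plain arrays, and the CPU backend slots in behind a familiar fit/predict surface. The BoltModel class and its methods below are hypothetical placeholders, not ThirdAI's actual API.

```python
# A hedged illustration of a drop-in CPU training backend. "BoltModel"
# and its methods are hypothetical stand-ins for the real Python bindings.
import numpy as np

class BoltModel:                        # placeholder for the real binding
    def __init__(self, in_dim, n_classes):
        self.W = np.zeros((n_classes, in_dim), dtype=np.float32)

    def fit(self, X, y, epochs=1):      # real backend trains sparsely in C
        pass

    def predict(self, X):
        return (X @ self.W.T).argmax(axis=1)

# Inputs can originate in TensorFlow, PyTorch, pandas, etc., as long as
# they arrive as NumPy arrays at the boundary.
X = np.random.randn(1_000, 784).astype(np.float32)
y = np.random.randint(0, 10, size=1_000)

model = BoltModel(in_dim=784, n_classes=10)
model.fit(X, y, epochs=5)
print(model.predict(X[:5]))
```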

From data center to edge: democratizing AI training

CPUs are readily available: the most consistent, cheapest, and most fully virtualized computing option in existence today. The advantage of using ThirdAI's BOLT to bring top AI performance to the CPU cannot be overstated. It delivers both acceleration and availability. Hardware acceleration is expensive and requires major infrastructure changes; BOLT on the CPU lets everyone use AI without costly infrastructure overhauls.

The efficiency of ThirdAI's algorithms, which need only a few CPU cores to achieve GPU-like speed, even makes AI training feasible on edge devices. This capability could change the overall economics of AI and IoT, which currently assume that AI training is a job for the cloud.

Virtualization with Radium: One-click AI acceleration, all in one place

Wherever dedicated acceleration is involved, application and infrastructure management can get messy. The variety of available AI acceleration options leaves end users facing assorted integration challenges. VMware's Project Radium provides a virtualization solution for AI/ML applications without requiring decades of systems work and ecosystem evolution. Radium will unify not only different hardware accelerators but also software acceleration backends. Through the integration of Radium and ThirdAI's BOLT, we will see significant AI acceleration on CPU-only servers, substantially speeding up existing infrastructure without any hardware accelerators.

The integration of Radium with ThirdAI's software backend will automatically turn existing CPUs, old or new, into AI powerhouses. This solution is the obvious first step before investing in any expensive infrastructure change. Radium's seamless developer experience, combined with ThirdAI's BOLT, brings high-performance ML training to environments from the cloud to the edge, supporting everything from high-end multi-CPU servers to low-end Raspberry Pi-class systems. Our CPU-driven approach will also help ensure that high-performance ML reaches everyone during the current global chip shortage.
